涉及环境声音分析的音频应用越来越多地使用通用音频表示(也称为嵌入)进行转移学习。最近,对音频表示形式(HEAR)的整体评估评估了关于19个不同任务的29个嵌入模型。但是,评估的有效性取决于给定数据集中已经捕获的变化。因此,对于给定的数据域,尚不清楚表示形式如何受到由无数麦克风范围和声学条件引起的变化的影响 - 通常称为通道效应。我们的目标是扩展听力,以评估不变性以在这项工作中的渠道效果。为此,我们通过向音频信号注入扰动来模仿通道效应,并用三个距离测量方法测量新(扰动)嵌入的变化,从而使评估域依赖但不依赖于任务依赖性。结合下游性能,它有助于我们对嵌入方式对频道效果的鲁棒性进行更明智的预测。我们评估了两个嵌入 - Yamnet和OpenL3在单声道(Urbansound8K)和多音(Sonyc-ust)Urban数据集上。我们表明,在这种无关的评估中,一个距离度量不足。尽管FR \'Echet音频距离(FAD)与下游任务中的性能下降趋势相关,但我们表明我们需要与其他距离一起研究时尚,以清楚地了解对该时尚的整体效果扰动。就嵌入性能而言,我们发现OpenL3比Yamnet更强大,Yamnet与听觉评估保持一致。
translated by 谷歌翻译
从未标记数据的代表学习一直是对人工智能研究的重大兴趣。虽然自我监督的言语代表学习在语音研究界受欢迎,但很少有效地对非语音音频任务进行了全面分析了音频表示学习。在本文中,我们提出了一种自我监督的音频表示学习方法,并将其应用于各种下游非语音音频任务。我们将众所周知的Wav2Vec 2.0框架结合起来,这在用于语音任务的自我监督学习中取得了成功,具有参数效率的构装体系结构。我们的自我监督的预培训可以减少三分之二的标记数据的需求。在Audioset基准测试中,我们达到平均平均精度(地图)得分为0.415,这是通过仅限音频自我监督的学习在此数据集上的新型最先进的。我们的微调符合子也超越了在几个下游任务上以监督方式预先培训的先前系统的性能。我们进一步讨论了预先培训和微调的重要设计考虑因素。
translated by 谷歌翻译
Psychology research has long explored aspects of human personality such as extroversion, agreeableness and emotional stability. Categorizations like the `Big Five' personality traits are commonly used to assess and diagnose personality types. In this work, we explore the question of whether the perceived personality in language models is exhibited consistently in their language generation. For example, is a language model such as GPT2 likely to respond in a consistent way if asked to go out to a party? We also investigate whether such personality traits can be controlled. We show that when provided different types of contexts (such as personality descriptions, or answers to diagnostic questions about personality traits), language models such as BERT and GPT2 can consistently identify and reflect personality markers in those contexts. This behavior illustrates an ability to be manipulated in a highly predictable way, and frames them as tools for identifying personality traits and controlling personas in applications such as dialog systems. We also contribute a crowd-sourced data-set of personality descriptions of human subjects paired with their `Big Five' personality assessment data, and a data-set of personality descriptions collated from Reddit.
translated by 谷歌翻译
Many real-world applications of language models (LMs), such as code autocomplete and writing assistance, involve human-LM interaction, but the main LM benchmarks are non-interactive, where a system produces output without human intervention. To evaluate human-LM interaction, we develop a framework, Human-AI Language-based Interaction Evaluation (H-LINE), that expands non-interactive evaluation along three dimensions, capturing (i) the interactive process, not only the final output; (ii) the first-person subjective experience, not just a third-party assessment; and (iii) notions of preference beyond quality. We then design five tasks ranging from goal-oriented to open-ended to capture different forms of interaction. On four state-of-the-art LMs (three variants of OpenAI's GPT-3 and AI21's J1-Jumbo), we find that non-interactive performance does not always result in better human-LM interaction and that first-person and third-party metrics can diverge, suggesting the importance of examining the nuances of human-LM interaction.
translated by 谷歌翻译
Bike sharing systems often suffer from poor capacity management as a result of variable demand. These bike sharing systems would benefit from models to predict demand in order to moderate the number of bikes stored at each station. In this paper, we attempt to apply a graph neural network model to predict bike demand in the New York City, Citi Bike dataset.
translated by 谷歌翻译
A hallmark of human intelligence is the ability to learn new concepts purely from language. Several recent approaches have explored training machine learning models via natural language supervision. However, these approaches fall short in leveraging linguistic quantifiers (such as 'always' or 'rarely') and mimicking humans in compositionally learning complex tasks. Here, we present LaSQuE, a method that can learn zero-shot classifiers from language explanations by using three new strategies - (1) modeling the semantics of linguistic quantifiers in explanations (including exploiting ordinal strength relationships, such as 'always' > 'likely'), (2) aggregating information from multiple explanations using an attention-based mechanism, and (3) model training via curriculum learning. With these strategies, LaSQuE outperforms prior work, showing an absolute gain of up to 7% in generalizing to unseen real-world classification tasks.
translated by 谷歌翻译
Large Language Models (LLMs) have been the subject of active research, significantly advancing the field of Natural Language Processing (NLP). From BERT to BLOOM, LLMs have surpassed state-of-the-art results in various natural language tasks such as question answering, summarization, and text generation. Many ongoing efforts focus on understanding LLMs' capabilities, including their knowledge of the world, syntax, and semantics. However, extending the textual prowess of LLMs to symbolic reasoning has been slow and predominantly focused on tackling problems related to the mathematical field. In this paper, we explore the use of LLMs for automated planning - a branch of AI concerned with the realization of action sequences (plans) to achieve a goal, typically executed by intelligent agents, autonomous robots, and unmanned vehicles. We introduce Plansformer; an LLM fine-tuned on planning problems and capable of generating plans with favorable behavior in terms of correctness and length with reduced knowledge-engineering efforts. We also demonstrate the adaptability of Plansformer in solving different planning domains with varying complexities, owing to the transfer learning abilities of LLMs. For one configuration of Plansformer, we achieve ~97% valid plans, out of which ~95% are optimal for Towers of Hanoi - a puzzle-solving domain.
translated by 谷歌翻译
Chatbots, or bots for short, are multi-modal collaborative assistants that can help people complete useful tasks. Usually, when chatbots are referenced in connection with elections, they often draw negative reactions due to the fear of mis-information and hacking. Instead, in this paper, we explore how chatbots may be used to promote voter participation in vulnerable segments of society like senior citizens and first-time voters. In particular, we build a system that amplifies official information while personalizing it to users' unique needs transparently. We discuss its design, build prototypes with frequently asked questions (FAQ) election information for two US states that are low on an ease-of-voting scale, and report on its initial evaluation in a focus group. Our approach can be a win-win for voters, election agencies trying to fulfill their mandate and democracy at large.
translated by 谷歌翻译
This paper presents a new approach for analyzing and identifying potentially useful generalized plans. It presents a new conceptual framework along with an algorithmic process for assessing termination and reachability related properties of generalized plans. The presented framework builds upon classic results on the analysis of graphs to decompose generalized plans into smaller components in a novel algorithm for conducting a hierarchical analysis for termination of arbitrary generalized plans. Theoretical analysis of the new framework establishes soundness of the presented algorithms and shows how it goes beyond existing approaches; empirical analysis illustrates the scope of this approach. Our analysis shows that this new approach can effectively identify termination for a significantly larger class of generalized plans than was possible using existing methods.
translated by 谷歌翻译
The primary obstacle to developing technologies for low-resource languages is the lack of representative, usable data. In this paper, we report the deployment of technology-driven data collection methods for creating a corpus of more than 60,000 translations from Hindi to Gondi, a low-resource vulnerable language spoken by around 2.3 million tribal people in south and central India. During this process, we help expand information access in Gondi across 2 different dimensions (a) The creation of linguistic resources that can be used by the community, such as a dictionary, children's stories, Gondi translations from multiple sources and an Interactive Voice Response (IVR) based mass awareness platform; (b) Enabling its use in the digital domain by developing a Hindi-Gondi machine translation model, which is compressed by nearly 4 times to enable it's edge deployment on low-resource edge devices and in areas of little to no internet connectivity. We also present preliminary evaluations of utilizing the developed machine translation model to provide assistance to volunteers who are involved in collecting more data for the target language. Through these interventions, we not only created a refined and evaluated corpus of 26,240 Hindi-Gondi translations that was used for building the translation model but also engaged nearly 850 community members who can help take Gondi onto the internet.
translated by 谷歌翻译